---
layout: post
title: Good list of NN training methods
date: '2015-03-15T16:27:00.000-07:00'
author: Alex
tags:
- Machine Learning
- Neural Networks
modified_time: '2015-03-30T13:09:51.992-07:00'
blogger_id: tag:blogger.com,1999:blog-307916792578626510.post-5188535836322744029
blogger_orig_url: http://brilliantlywrong.blogspot.com/2015/03/good-list-of-nn-trainin-methods.html
---

<p>
    I have been diving deeper into methods for training neural networks.
</p>
<p>
    A good, yet incomplete, list of what has been done in this area is given in
    <a href="http://jmlr.csail.mit.edu/papers/volume7/castillo06a/castillo06a.pdf">this article</a> from 2006.
</p>
<p>
    Unfortunately, it does not cover my favourite method, Rprop, or its modifications (IRprop+, IRprop-).
</p>
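<p>
    For readers who have not met Rprop, here is a minimal sketch of the IRprop- update rule as I understand it.
    The names <code>irprop_minus</code> and <code>grad_fn</code>, the toy quadratic loss, and the initial step and step bounds are
    illustrative assumptions of mine; 1.2 and 0.5 are the commonly used increase and decrease factors. This is not the code used in any particular library.
</p>

```python
import numpy as np


def irprop_minus(grad_fn, w0, n_steps=300, step_init=0.1,
                 eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Minimise a function with an IRprop- style update (illustrative sketch).

    grad_fn: hypothetical callable returning the gradient at w as a new array.
    Only the sign of each partial derivative is used, never its magnitude.
    """
    w = np.array(w0, dtype=float)
    steps = np.full_like(w, step_init)   # one step size per weight
    prev_grad = np.zeros_like(w)
    for _ in range(n_steps):
        grad = grad_fn(w)
        agreement = grad * prev_grad
        # Same sign as on the previous iteration: grow the step.
        steps = np.where(agreement > 0, np.minimum(steps * eta_plus, step_max), steps)
        # Sign flipped (we jumped over a minimum): shrink the step ...
        steps = np.where(agreement < 0, np.maximum(steps * eta_minus, step_min), steps)
        # ... and zero the gradient, so this weight stays put for one
        # iteration and its step is not adapted on the next one.
        grad = np.where(agreement < 0, 0.0, grad)
        w -= np.sign(grad) * steps
        prev_grad = grad
    return w


# Toy usage: a badly scaled quadratic loss sum(scales * (w - target)**2).
target = np.array([1.0, -2.0, 3.0])
scales = np.array([1.0, 10.0, 100.0])
w_found = irprop_minus(lambda w: 2 * scales * (w - target), np.zeros(3))
print(w_found)
```

<p>
    Because only signs are used, the very different curvatures along the three axes of the toy loss do not affect the update,
    which is a large part of why I like this family of methods.
</p>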
<p>
    I also recently spent some time experimenting with neural networks and decided to improve IRprop by tracking
    progress not only along each axis, but also along the directions<br/>$w_i + w_j$ and $w_i - w_j$, expecting that this
    would speed up training.<br/>
    It did speed up training at first, but the loss very quickly stops decreasing, and once it is close to the minimal value, serious oscillations start
    and the optimization process becomes unstable.
    This method is implemented as the experimental IRprop* trainer in <code>hep_ml</code>.
</p>
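<p>
    A short note on what such diagonal directions measure. The notation $g_i = \partial L / \partial w_i$ for the loss $L$ is introduced here
    only for illustration, and this is not necessarily how the trainer computes things. The derivatives of the loss along the unit vectors
    of the two diagonal directions are
    $$ \frac{g_i + g_j}{\sqrt{2}} \quad \text{and} \quad \frac{g_i - g_j}{\sqrt{2}}, $$
    so their signs can be read off the ordinary gradient without any extra backward pass.
</p>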
<p>
    <strong>Update:</strong> this old post, with its link from 2006, is quite obsolete and not suitable for those interested in training deep networks.
    Instead, please have a look at this <a href="http://sebastianruder.com/optimizing-gradient-descent/">overview</a>
    of methods for adaptive stochastic gradient optimization.
</p>